(wip) feat(tables): in-process Iceberg REST Catalog adapter by mkuchenbecker · Pull Request #607 · linkedin/openhouse

mkuchenbecker · 2026-05-27T20:17:47Z

Summary

Adds an in-process Iceberg REST Catalog facade in front of the existing Tables Service. The new com.linkedin.openhouse.tables.rest package is picked up by the existing TablesSpringApplication component scan — no Spring-app wiring changes required. Any client that speaks the Apache Iceberg REST wire protocol (Spark, Trino, PyIceberg, Flink, …) can now read and write OpenHouse tables without an OpenHouse-specific plugin.

The new package contributes ~960 lines of new Java, zero changes to existing files. It reuses TablesApiHandler, IcebergSnapshotsApiHandler, DatabasesApiHandler, and OpenHouseInternalCatalog via constructor injection. Server-side metadata authorship and the existing two-stage CAS (path-string version check + HouseTables @Version JPA lock) are preserved unchanged — REST clients reach the same OpenHouseInternalTableOperations.doCommit path the existing OpenHouse Java client already uses.
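For reviewers less familiar with the commit path, the two-stage CAS can be pictured roughly as below. This is a sketch under assumptions, not the OpenHouse implementation: `TwoStageCasSketch`, `Row`, `seed`, `read`, and `commit` are hypothetical names, and the in-memory map stands in for HouseTables. The real checks live in `OpenHouseInternalTableOperations.doCommit` and the JPA `@Version` column, whose details this PR does not show.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: every name here is hypothetical, not OpenHouse code. */
final class TwoStageCasSketch {

  /** Stand-in for a stored table row: current metadata path plus a lock version. */
  static final class Row {
    final String metadataLocation;
    final long version;

    Row(String metadataLocation, long version) {
      this.metadataLocation = metadataLocation;
      this.version = version;
    }
  }

  // In-memory stand-in for the backing store (an assumption for this sketch).
  private final Map<String, Row> rows = new HashMap<>();

  synchronized void seed(String table, String metadataLocation) {
    rows.put(table, new Row(metadataLocation, 0L));
  }

  synchronized Row read(String table) {
    return rows.get(table);
  }

  /** Returns true only if both checks pass against the row the committer started from. */
  synchronized boolean commit(String table, Row base, String newMetadataLocation) {
    Row current = rows.get(table);
    if (current == null) {
      return false;
    }
    // Stage 1: the committer's base metadata path must match the stored path.
    if (!current.metadataLocation.equals(base.metadataLocation)) {
      return false;
    }
    // Stage 2: the row version must be unchanged (roughly what @Version enforces).
    if (current.version != base.version) {
      return false;
    }
    rows.put(table, new Row(newMetadataLocation, current.version + 1));
    return true;
  }

  public static void main(String[] args) {
    TwoStageCasSketch store = new TwoStageCasSketch();
    store.seed("smoke.t1", "v0.metadata.json");
    Row base = store.read("smoke.t1");
    // The first writer commits against a fresh base; the second reuses the stale base.
    System.out.println(store.commit("smoke.t1", base, "v1.metadata.json"));
    System.out.println(store.commit("smoke.t1", base, "v2.metadata.json"));
  }
}
```

In this toy model the two stages are redundant, since both live in one map. In the real service they guard different layers, which is why both are kept.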

Changes

Client-facing API: new endpoints under `/iceberg/v1/...`

All endpoints accept and return Iceberg's standard REST wire format.

| Method | Path | Backed by |
| --- | --- | --- |
| `GET` | `/v1/config` | static; echoes `?warehouse=` back as `overrides.prefix` |
| `GET` | `/v1/{prefix}/namespaces` | `DatabasesApiHandler.getAllDatabases` |
| `POST` | `/v1/{prefix}/namespaces` | accepted as success (OpenHouse auto-creates on first table) |
| `GET` | `/v1/{prefix}/namespaces/{namespace}` | existence check via `DatabasesApiHandler` |
| `HEAD` | `/v1/{prefix}/namespaces/{namespace}` | same |
| `DELETE` | `/v1/{prefix}/namespaces/{namespace}` | accepted as 204 (no native drop-namespace) |
| `GET` | `/v1/{prefix}/namespaces/{namespace}/tables` | `TablesApiHandler.searchTables` |
| `POST` | `/v1/{prefix}/namespaces/{namespace}/tables` | `TablesApiHandler.createTable` |
| `GET` | `/v1/{prefix}/namespaces/{namespace}/tables/{table}` | `TablesApiHandler.getTable` + `OpenHouseInternalCatalog.loadTable` |
| `HEAD` | `/v1/{prefix}/namespaces/{namespace}/tables/{table}` | same |
| `POST` | `/v1/{prefix}/namespaces/{namespace}/tables/{table}` (commit) | `IcebergSnapshotsApiHandler.putIcebergSnapshots` or `TablesApiHandler.updateTable` |
| `DELETE` | `/v1/{prefix}/namespaces/{namespace}/tables/{table}` | `TablesApiHandler.deleteTable` |
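To make the `/v1/config` row concrete, here is a minimal sketch of the response body it describes. `ConfigResponseSketch` and `configJson` are hypothetical names, the JSON is assembled by hand rather than through iceberg-core's `ConfigResponse`, and the behaviour when `?warehouse=` is absent (empty `overrides`) is an assumption, since the PR does not say what happens in that case.

```java
/** Illustrative only: hand-built JSON in the shape of Iceberg's config response. */
final class ConfigResponseSketch {

  /** Builds the GET /v1/config body, echoing the warehouse as overrides.prefix. */
  static String configJson(String warehouse) {
    if (warehouse == null || warehouse.isEmpty()) {
      // Assumption: with no warehouse parameter, no prefix override is returned.
      return "{\"defaults\":{},\"overrides\":{}}";
    }
    return "{\"defaults\":{},\"overrides\":{\"prefix\":\"" + escape(warehouse) + "\"}}";
  }

  /** Minimal JSON string escaping for backslashes and double quotes. */
  private static String escape(String s) {
    return s.replace("\\", "\\\\").replace("\"", "\\\"");
  }

  public static void main(String[] args) {
    // prints {"defaults":{},"overrides":{"prefix":"oh"}}
    System.out.println(configJson("oh"));
  }
}
```

With `warehouse = oh`, a stock REST client would then address the remaining endpoints as `/v1/oh/namespaces/...`, which is how the `{prefix}` segment in the table gets filled in.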

The commit endpoint replays the Iceberg requirements + updates payload via MetadataUpdate.applyTo(TableMetadata.Builder), pre-checks each UpdateRequirement, then discriminates: snapshot changes route to IcebergSnapshotsApiHandler.putIcebergSnapshots; metadata-only commits route to TablesApiHandler.updateTable.
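The discrimination step could look something like the sketch below. It is an assumption that routing is decided by inspecting update actions: the adapter may instead diff the rebuilt `TableMetadata`, and the PR text does not say which. The action strings are the ones the Iceberg REST spec uses for snapshot and ref changes; `CommitRoutingSketch`, `Route`, and `route` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: routes a commit by the kinds of updates it carries. */
final class CommitRoutingSketch {

  enum Route { SNAPSHOTS, METADATA_ONLY }

  // Update actions that touch snapshots or snapshot refs.
  private static final Set<String> SNAPSHOT_ACTIONS =
      Collections.unmodifiableSet(
          new HashSet<>(
              Arrays.asList(
                  "add-snapshot",
                  "remove-snapshots",
                  "set-snapshot-ref",
                  "remove-snapshot-ref")));

  /** Any snapshot-related action sends the whole commit down the snapshots path. */
  static Route route(List<String> updateActions) {
    for (String action : updateActions) {
      if (SNAPSHOT_ACTIONS.contains(action)) {
        return Route.SNAPSHOTS;
      }
    }
    return Route.METADATA_ONLY;
  }

  public static void main(String[] args) {
    // An INSERT typically carries add-snapshot plus set-snapshot-ref.
    System.out.println(route(Arrays.asList("add-snapshot", "set-snapshot-ref")));
    // A property change carries no snapshot actions.
    System.out.println(route(Arrays.asList("set-properties")));
  }
}
```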

New Features

Lets any stock Iceberg REST client (Spark org.apache.iceberg.rest.RESTCatalog, PyIceberg RestCatalog, Trino iceberg-rest connector, Flink) talk to OpenHouse without per-engine catalog code.

A @RestControllerAdvice(basePackages = "com.linkedin.openhouse.tables.rest") maps OpenHouse internal exceptions to Iceberg's wire-format ErrorResponse JSON. The advice is package-scoped so OpenHouse's existing exception handler for the native /v1/databases/... surface is unaffected.
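As a rough illustration of that translation, the sketch below maps three OpenHouse exception names to the Iceberg error envelope (`{"error": {"message", "type", "code"}}`). The specific pairings and status codes are assumptions based on the Iceberg REST spec, not taken from this PR's code, and `ErrorMappingSketch` and `toErrorJson` are hypothetical names. The real advice would use Spring `@ExceptionHandler` methods and iceberg-core's `ErrorResponse` rather than string matching.

```java
/** Illustrative only: maps exception names to an Iceberg-style error body. */
final class ErrorMappingSketch {

  /** Returns the JSON error envelope for the given OpenHouse exception name. */
  static String toErrorJson(String openHouseException, String message) {
    String type;
    int code;
    switch (openHouseException) {
      case "NoSuchUserTableException":
        type = "NoSuchTableException"; // assumed pairing
        code = 404;
        break;
      case "AlreadyExistsException":
        type = "AlreadyExistsException";
        code = 409;
        break;
      case "EntityConcurrentModificationException":
        type = "CommitFailedException"; // assumed pairing
        code = 409;
        break;
      default:
        type = "InternalServerError"; // assumed fallback
        code = 500;
        break;
    }
    return "{\"error\":{\"message\":\"" + escape(message)
        + "\",\"type\":\"" + type + "\",\"code\":" + code + "}}";
  }

  /** Minimal JSON string escaping for backslashes and double quotes. */
  private static String escape(String s) {
    return s.replace("\\", "\\\\").replace("\"", "\\\"");
  }

  public static void main(String[] args) {
    System.out.println(toErrorJson("NoSuchUserTableException", "Table smoke.t1 not found"));
  }
}
```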

MVP scope notes (intentional)

Single-level namespaces only (depth > 1 → 400). Rejection was chosen over flatten-encoding so that a future multi-level migration is purely additive (no HDFS path rewrites). Spark, Trino, and PyIceberg all work with depth-1 namespaces when the warehouse is configured that way.
Out of scope for this PR: views, multi-table transactions, server-side scan planning, credential vending, remote signing. These are simply not advertised — clients gracefully skip them.
No new external dependencies — Iceberg wire types (UpdateTableRequest, LoadTableResponse, ConfigResponse, ErrorResponse, …) come from iceberg-core 1.5.2 already on the Tables Service classpath.
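The single-level rule in the first note could be enforced along these lines. The Iceberg REST spec joins the levels of a multi-part namespace with the unit separator (`0x1F`, `%1F` on the wire), so a decoded path segment containing that character has depth > 1. `NamespaceDepthSketch` and `requireSingleLevel` are hypothetical names, and mapping the thrown `IllegalArgumentException` to HTTP 400 is assumed to happen elsewhere (for example in the controller advice).

```java
/** Illustrative only: rejects namespaces with more than one level. */
final class NamespaceDepthSketch {

  // Separator between namespace levels in a decoded REST path segment.
  private static final String UNIT_SEPARATOR = "\u001F";

  /**
   * Returns the single namespace level, or throws IllegalArgumentException
   * (assumed to be translated into a 400 response by the caller).
   */
  static String requireSingleLevel(String decodedNamespace) {
    // Limit -1 keeps trailing empty levels so "a<US>" is also treated as depth 2.
    String[] levels = decodedNamespace.split(UNIT_SEPARATOR, -1);
    if (levels.length != 1 || levels[0].isEmpty()) {
      throw new IllegalArgumentException(
          "Only single-level namespaces are supported, got depth " + levels.length);
    }
    return levels[0];
  }

  public static void main(String[] args) {
    System.out.println(requireSingleLevel("smoke"));
    try {
      requireSingleLevel("a" + UNIT_SEPARATOR + "b");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```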

Testing Done

End-to-end smoke test against the oh-hadoop-spark docker recipe, using the stock Iceberg RESTCatalog client with no OpenHouse plugin activated.

```bash
./gradlew :services:tables:bootJar
cd infra/recipes/docker-compose/oh-hadoop-spark
docker compose build openhouse-tables
docker compose up -d
```

Spark session config (catalog oh is stock Iceberg):

```
spark.sql.catalog.oh = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.oh.catalog-impl = org.apache.iceberg.rest.RESTCatalog
spark.sql.catalog.oh.uri = http://openhouse-tables:8080/iceberg/
spark.sql.catalog.oh.token =
spark.sql.catalog.oh.warehouse = oh
```

SQL script executed:

```sql
SHOW NAMESPACES IN oh;
CREATE NAMESPACE IF NOT EXISTS oh.smoke;
DROP TABLE IF EXISTS oh.smoke.t1;
CREATE TABLE oh.smoke.t1 (id bigint, name string) USING iceberg;
SHOW TABLES IN oh.smoke;
INSERT INTO oh.smoke.t1 VALUES (1,'alice'),(2,'bob'),(3,'carol');
SELECT count(*) FROM oh.smoke.t1;
SELECT * FROM oh.smoke.t1 ORDER BY id;
INSERT INTO oh.smoke.t1 VALUES (4,'dave');
SELECT count(*) FROM oh.smoke.t1;
SELECT * FROM oh.smoke.t1 ORDER BY id;
DROP TABLE oh.smoke.t1;
SHOW TABLES IN oh.smoke;
```

Result (trimmed):

```
smoke t1
Time taken: 0.595 seconds, Fetched 1 row(s)
Time taken: 6.497 seconds
3
1 alice
2 bob
3 carol
Time taken: 1.093 seconds, Fetched 3 row(s)
Time taken: 2.57 seconds
4
1 alice
2 bob
3 carol
4 dave
Time taken: 0.351 seconds, Fetched 4 row(s)
```

All commands succeed end-to-end. Spark uses the stock Iceberg RESTCatalog; the adapter translates each request, delegates to existing OpenHouse handlers, and translates the response back. The new metadata.json files are written server-side by OpenHouseInternalTableOperations.doCommit (unchanged), and Spark reads them back via the standard Iceberg path.

Follow-up work intentionally not in this PR:

Unit + Spring @WebMvcTest coverage for each controller
@SpringBootTest with the upstream RESTCatalog Java client against an H2 Tables Service
CI smoke that runs the docker SQL round-trip on every PR
Configuration knob to enable/disable the adapter (currently always on)
Credential vending (LoadTableResponse.storage-credentials)
Multi-table transactions, views, multi-level namespaces

Additional Information

No breaking changes. The new package is additive; the existing /v1/databases/... surface is untouched. The new @RestControllerAdvice is scoped via basePackages = "com.linkedin.openhouse.tables.rest" so OpenHouse's existing exception handler keeps owning everything else.

Adds an in-process Iceberg REST Catalog facade in front of the Tables Service. The new `com.linkedin.openhouse.tables.rest` package is picked up by the existing `TablesSpringApplication` component scan; no Spring-app wiring changes are required.

Endpoints (all under `/iceberg/v1/...`):

- GET /v1/config
- GET /v1/{prefix}/namespaces
- POST /v1/{prefix}/namespaces
- GET /v1/{prefix}/namespaces/{namespace}
- HEAD /v1/{prefix}/namespaces/{namespace}
- DELETE /v1/{prefix}/namespaces/{namespace}
- GET /v1/{prefix}/namespaces/{namespace}/tables
- POST /v1/{prefix}/namespaces/{namespace}/tables
- GET /v1/{prefix}/namespaces/{namespace}/tables/{table}
- HEAD /v1/{prefix}/namespaces/{namespace}/tables/{table}
- POST /v1/{prefix}/namespaces/{namespace}/tables/{table}
- DELETE /v1/{prefix}/namespaces/{namespace}/tables/{table}

The commit endpoint replays the Iceberg `requirements + updates` payload via `MetadataUpdate.applyTo(TableMetadata.Builder)`, then discriminates between snapshot commits (routed to `IcebergSnapshotsApiHandler.putIcebergSnapshots`) and metadata-only commits (routed to `TablesApiHandler.updateTable`). Server-side metadata authorship and the existing two-stage CAS (path-string version check plus HouseTables `@Version` JPA lock) are preserved unchanged: REST clients reach the same `OpenHouseInternalTableOperations.doCommit` path that OpenHouse's Java client already uses.

MVP scope:

- single-level namespaces (Iceberg-spec depth > 1 -> 400 BadRequest); rejection chosen over flatten-encoding so a future multi-level migration is purely additive and does not require HDFS path rewrites.
- no views, no multi-table transactions, no scan planning, no credential vending, no remote signing. Out-of-scope features are simply not advertised.
- depends on iceberg-core 1.5.2 wire types (`UpdateTableRequest`, `LoadTableResponse`, `ConfigResponse`, `ErrorResponse`, ...) already on the Tables Service classpath; no new external dependencies.

A `@RestControllerAdvice(basePackages = "com.linkedin.openhouse.tables.rest")` maps OpenHouse exceptions (`NoSuchUserTableException`, `AlreadyExistsException`, `EntityConcurrentModificationException`, ...) to Iceberg's wire-format `ErrorResponse`. The advice is scoped to the new package so OpenHouse's existing exception handler for the native `/v1/databases/...` surface is unaffected.

Smoke-tested end-to-end against the `oh-hadoop-spark` docker recipe: a Spark 3.1 spark-sql session configured with stock `org.apache.iceberg.spark.SparkCatalog` + `catalog-impl = org.apache.iceberg.rest.RESTCatalog` (no OpenHouse plugin activated) successfully runs a CREATE NAMESPACE, CREATE TABLE, INSERT, SELECT, DROP TABLE round-trip against the Tables Service.

cbb330

Let's integrate with the existing work, since it has been reviewed and has a known-good architecture. It is read-side only; the next set of changes should be write-side,

then database.

This is not in-review

cbb330 previously requested changes May 27, 2026

mkuchenbecker changed the title ~~feat(tables): in-process Iceberg REST Catalog adapter~~ (wip) feat(tables): in-process Iceberg REST Catalog adapter May 27, 2026

mkuchenbecker wants to merge 1 commit into linkedin:main from mkuchenbecker:mkuchenb/iceberg-rest-adapter